构建可用的无线电监控自动语音识别(ASR)系统是资源不足的语言的一项挑战性任务,但这在广播是公众沟通和讨论的主要媒介的社会中至关重要。联合国在乌干达的最初努力证明了如何理解被社交媒体排除在社交媒体中的农村人的看法在国家规划中很重要。但是,由于缺乏转录的语音数据集,这些努力正受到挑战。在本文中,Makerere人工智能研究实验室发布了155小时的Luganda Radio演讲语料库。据我们所知,这是撒哈拉以南非洲第一个公开可用的广播数据集。本文描述了语音语料库的开发,并使用开源语音识别工具包Coqui STT Toolkit提出了基线Luganda ASR绩效结果。
translated by 谷歌翻译
口语内容中的话语码切换(CS)的普及性具有强制ASR系统来处理混合输入。然而,设计CS-ASR具有许多挑战,主要原因是数据稀缺,语法结构复杂性和不匹配以及不平衡的语言使用分配。最近的ASR研究表明E2E-ASR使用多语种数据来处理CS现象的少量CS数据。但是,对CS数据的依赖仍然存在。在这项工作中,我们提出了一种方法来增加用于人工生成的CS文本的单格式数据以改善不同的语音模块。我们在利用对齐的转换对的同时基于对等效约束理论的方法,以生成语法有效的CS内容。我们的经验结果表明,两种生态和嘈杂的CS测试集,在困惑中的相对增益为29-34%,而在WER中约为2%。最后,人类评估表明,人类可以获得83.8%的生成数据。
translated by 谷歌翻译
深度神经网络已经显示出使用医学图像数据的疾病检测和分类结果。然而,他们仍然遭受处理真实世界场景的挑战,特别是可靠地检测分配(OOD)样本。我们提出了一种方法来强化皮肤和疟疾样本的ood样本,而无需在训练期间获得标记的OOD样品。具体而言,我们使用度量学习以及Logistic回归来强制深度网络学习众多丰富的类代表功能。要指导对OOD示例的学习过程,我们通过删除图像或置换图像部件中的类特定的突出区域并远离分布式样本来生成ID类似的示例。在推理时间期间,用于检测分布外样品的K +互易邻居。对于皮肤癌ood检测,我们使用两个标准基准皮肤癌症ISIC数据集AS ID,六种不同的数据集具有不同难度水平的数据集被视为出于分配。对于疟疾检测,我们使用BBBC041 Malaria DataSet作为ID和五个不同的具有挑战性的数据集,如分销。我们在先前的先前皮肤癌和疟疾OOD检测中,我们在TNR @ TPR95%中提高了最先进的结果,改善了5%和4%。
translated by 谷歌翻译
Existing automated techniques for software documentation typically attempt to reason between two main sources of information: code and natural language. However, this reasoning process is often complicated by the lexical gap between more abstract natural language and more structured programming languages. One potential bridge for this gap is the Graphical User Interface (GUI), as GUIs inherently encode salient information about underlying program functionality into rich, pixel-based data representations. This paper offers one of the first comprehensive empirical investigations into the connection between GUIs and functional, natural language descriptions of software. First, we collect, analyze, and open source a large dataset of functional GUI descriptions consisting of 45,998 descriptions for 10,204 screenshots from popular Android applications. The descriptions were obtained from human labelers and underwent several quality control mechanisms. To gain insight into the representational potential of GUIs, we investigate the ability of four Neural Image Captioning models to predict natural language descriptions of varying granularity when provided a screenshot as input. We evaluate these models quantitatively, using common machine translation metrics, and qualitatively through a large-scale user study. Finally, we offer learned lessons and a discussion of the potential shown by multimodal models to enhance future techniques for automated software documentation.
translated by 谷歌翻译
In this paper, we reduce the complexity of approximating the correlation clustering problem from $O(m\times\left( 2+ \alpha (G) \right)+n)$ to $O(m+n)$ for any given value of $\varepsilon$ for a complete signed graph with $n$ vertices and $m$ positive edges where $\alpha(G)$ is the arboricity of the graph. Our approach gives the same output as the original algorithm and makes it possible to implement the algorithm in a full dynamic setting where edge sign flipping and vertex addition/removal are allowed. Constructing this index costs $O(m)$ memory and $O(m\times\alpha(G))$ time. We also studied the structural properties of the non-agreement measure used in the approximation algorithm. The theoretical results are accompanied by a full set of experiments concerning seven real-world graphs. These results shows superiority of our index-based algorithm to the non-index one by a decrease of %34 in time on average.
translated by 谷歌翻译
This paper proposes a novel self-supervised based Cut-and-Paste GAN to perform foreground object segmentation and generate realistic composite images without manual annotations. We accomplish this goal by a simple yet effective self-supervised approach coupled with the U-Net based discriminator. The proposed method extends the ability of the standard discriminators to learn not only the global data representations via classification (real/fake) but also learn semantic and structural information through pseudo labels created using the self-supervised task. The proposed method empowers the generator to create meaningful masks by forcing it to learn informative per-pixel as well as global image feedback from the discriminator. Our experiments demonstrate that our proposed method significantly outperforms the state-of-the-art methods on the standard benchmark datasets.
translated by 谷歌翻译
Machine learning models are typically evaluated by computing similarity with reference annotations and trained by maximizing similarity with such. Especially in the bio-medical domain, annotations are subjective and suffer from low inter- and intra-rater reliability. Since annotations only reflect the annotation entity's interpretation of the real world, this can lead to sub-optimal predictions even though the model achieves high similarity scores. Here, the theoretical concept of Peak Ground Truth (PGT) is introduced. PGT marks the point beyond which an increase in similarity with the reference annotation stops translating to better Real World Model Performance (RWMP). Additionally, a quantitative technique to approximate PGT by computing inter- and intra-rater reliability is proposed. Finally, three categories of PGT-aware strategies to evaluate and improve model performance are reviewed.
translated by 谷歌翻译
Finding and localizing the conceptual changes in two scenes in terms of the presence or removal of objects in two images belonging to the same scene at different times in special care applications is of great significance. This is mainly due to the fact that addition or removal of important objects for some environments can be harmful. As a result, there is a need to design a program that locates these differences using machine vision. The most important challenge of this problem is the change in lighting conditions and the presence of shadows in the scene. Therefore, the proposed methods must be resistant to these challenges. In this article, a method based on deep convolutional neural networks using transfer learning is introduced, which is trained with an intelligent data synthesis process. The results of this method are tested and presented on the dataset provided for this purpose. It is shown that the presented method is more efficient than other methods and can be used in a variety of real industrial environments.
translated by 谷歌翻译
Simulation-based falsification is a practical testing method to increase confidence that the system will meet safety requirements. Because full-fidelity simulations can be computationally demanding, we investigate the use of simulators with different levels of fidelity. As a first step, we express the overall safety specification in terms of environmental parameters and structure this safety specification as an optimization problem. We propose a multi-fidelity falsification framework using Bayesian optimization, which is able to determine at which level of fidelity we should conduct a safety evaluation in addition to finding possible instances from the environment that cause the system to fail. This method allows us to automatically switch between inexpensive, inaccurate information from a low-fidelity simulator and expensive, accurate information from a high-fidelity simulator in a cost-effective way. Our experiments on various environments in simulation demonstrate that multi-fidelity Bayesian optimization has falsification performance comparable to single-fidelity Bayesian optimization but with much lower cost.
translated by 谷歌翻译
Function approximation has enabled remarkable advances in applying reinforcement learning (RL) techniques in environments with high-dimensional inputs, such as images, in an end-to-end fashion, mapping such inputs directly to low-level control. Nevertheless, these have proved vulnerable to small adversarial input perturbations. A number of approaches for improving or certifying robustness of end-to-end RL to adversarial perturbations have emerged as a result, focusing on cumulative reward. However, what is often at stake in adversarial scenarios is the violation of fundamental properties, such as safety, rather than the overall reward that combines safety with efficiency. Moreover, properties such as safety can only be defined with respect to true state, rather than the high-dimensional raw inputs to end-to-end policies. To disentangle nominal efficiency and adversarial safety, we situate RL in deterministic partially-observable Markov decision processes (POMDPs) with the goal of maximizing cumulative reward subject to safety constraints. We then propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time. We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL. Our experiments demonstrate both the efficacy of the proposed approach for certifying safety in adversarial environments, and the value of the PSRL framework coupled with adversarial training in improving certified safety while preserving high nominal reward and high-quality predictions of true state.
translated by 谷歌翻译